Abstract

Even  though  numerous  algorithms  exist  for  estimating  the  structure  of  a  scene  from  its  video,  the  solutions 
obtained  are  often  of  unacceptable  quality.  To  overcome  some  of  the  deficiencies,  many  application  systems  rely 
on  processing  more  information  than  necessary  with  the  hope  that  the  redundancy  will  help  improve  the  quality. 
This  raises  the  question  about  how  the  accuracy  of  the  solution  is  related  to  the  amount  of  information  processed 
by  the  algorithm.  Can  we  define  the  accuracy  of  the  solution  precisely  enough  that  we  automatically  recognize 
situations  where  the  quality  of  the  data  is  so  bad  that  even  a  large  number  of  additional  observations  will  not  yield 
the  desired  solution?  This  paper  proposes  an  information  theoretic  criterion  for  evaluating  the  quality  of  a  3D 
reconstruction  in  terms  of  the  statistics  of  the  observed  parameters  (i.e.  the  image  correspondences).  The  accuracy 
of  the  reconstruction  is  judged  by  considering  the  change  in  mutual  information  (or  equivalently  the  conditional 
differential  entropy)  between  a  scene  and  its  reconstructions  and  its  effectiveness  is  shown  through  simulations. 
A  brief  discussion  on  the  applicability  of  information  theoretic  criteria  for  other  vision  algorithms  concludes  the 
paper. 


1  Introduction 

Obtaining accurate 3D models from video using the structure from motion (SfM) approach [1], [2] is extremely
important because of its diverse applications, ranging from multimedia to medical diagnosis. Yet the quality of many of
the automatic 3D reconstructions leaves much to be desired. This has led many researchers to analyze the sensitivity,
robustness and statistical error characterization of the existing algorithms, trying to understand algorithm behavior
and the characteristics of the natural phenomenon that is being modeled [3], [4], [5], [6], [7], [8], [9]. To overcome
these errors, the tendency has been to add redundancy in the information processed. This raises the question as to
how the redundant information affects the quality of the final solution. In this paper, we consider the situation where
multiple reconstructions of the same scene are available (called intermediate or individual reconstructions in this
paper) that are combined together to obtain the final estimate (Figure 1). We compute the incremental mutual
information between the unknown 3D structure and increasing numbers of intermediate reconstructions.

Before proceeding to give a detailed description of the idea, we would like to draw the attention of the reader
briefly to the area of model selection in statistics (AIC, BIC, MDL etc. [10]). The idea of fitting models to geometric
data was formalized by Kanatani using a Geometric Information Criterion (GIC) [11]. However, a large number of
SfM algorithms are not model based; they reconstruct individual point features of the scene. Our work tries to define
the quality of reconstruction from point features in information theoretic terms. We also provide a discussion on the
usefulness of information theoretic measures for evaluating computer vision algorithms.

Figure 1: Block diagram representation of the reconstruction framework. $X$ is the inverse depth that we want to estimate, $(\mathbf{H}(1), \ldots, \mathbf{H}(L))$
are the intermediate reconstructions (e.g. from each individual camera), and $\hat{X}$ is the final fused estimate.

2  An  Information  Theoretic  Criterion  for  3D  Reconstruction 

2.1  Problem  Formulation 

We assume that all the depth values are aligned to a common frame of reference. Feature points will be represented
by subscripts; separate reconstructions will be within parentheses. The vector of estimates of the inverse depth$^{1}$ of
the $i$-th feature, $[H_i(1), \ldots, H_i(N)]'$, will be denoted by $H_i^{(N)}$. The boldface notation $\mathbf{H}(j)$ will represent all the
features in the $j$-th reconstruction. The final estimate $\hat{X}$ of $X = [X_1, \ldots, X_M]'$ is obtained by fusing the individual
reconstructions $(\mathbf{H}(1), \ldots, \mathbf{H}(L))$. To keep the notation simple, the subscript for the feature point will not be
mentioned, unless required. The individual estimates are modeled as

$$H(i) = X + V(i) \tag{1}$$

where $X$ is the inverse depth value of the particular feature and $V(i)$ is the error in its $i$-th reconstruction.


2.2  Main  Result 

We will now present an information theoretic measure for evaluating the quality of a 3D reconstruction algorithm by
analyzing the contribution of each of the individual reconstructions. Our entire analysis is for a particular point and
thus the subscript will be dropped, unless required for clarity. Our criterion for evaluating the quality of reconstruction
depends on estimating the difference in mutual information for the two sets of observations, $H^{(L)}$ and $H^{(L-1)}$.
We term this the incremental mutual information (IMI), i.e.

$$\Delta I(L) = I(X; H^{(L)}) - I(X; H^{(L-1)}). \tag{2}$$

The term gives us an idea of the contribution of the $L$-th observation to the reconstruction strategy with respect to the
previous $(L - 1)$ observations. As the number of observations increases, the effect of an additional observation
decreases and approaches zero in the limit. In order to be assured that the reconstruction quality is actually improving,
we need to consider only those situations where the mutual information $I(X; H^{(L)})$ is non-decreasing. This ensures
that we remove cases where the reconstruction is actually getting worse, and further observations are not improving
it any more.

Using the relationship between mutual information and entropy, it is possible to obtain a different interpretation
of the IMI. Denoting by $h(X)$ the entropy of the random variable $X$, we know that [12] $I(X; Y) = h(X) + h(Y) - h(X, Y)$.
Thus $\Delta I(L)$ in (2) can be written as

$$\Delta I(L) = I(X; H^{(L)}) - I(X; H^{(L-1)}) = h(X \mid H^{(L-1)}) - h(X \mid H^{(L)}). \tag{3}$$

The quantity defined as the IMI can also be referred to as the incremental conditional entropy. Since entropy of a
random variable is a measure of its uncertainty, $\Delta I$ measures the reduction in the uncertainty as we add an extra
observation. Since the IMI tends to zero in the limit, the difference in the conditional entropy also approaches zero.
Thus we will consider more and more images from the video sequence till the uncertainty in the final structure
estimate can be reduced no further. This is the intuitive idea behind our criterion in (2).

1 The inverse depth is used throughout this paper since it is the quantity that is estimated from the SfM equations for reconstruction from a
video and its statistics can be obtained in an analytic form more easily than for the depth.

The rate at which the IMI decreases is also an important measure of the progress of the algorithm. An extremely
slow rate of fall indicates that more images will be necessary to achieve an acceptable level of quality. Since there
is motion between adjacent frames of the video, a particular point will move out of the field of view of the camera
after a certain amount of time. A very slow rate of fall of $\Delta I$ might mean that the quality of the reconstruction is not
good enough even when the point is no longer visible. The rate of change of $\Delta I$ can be obtained as

$$\Delta^2 I(L) = \Delta I(L) - \Delta I(L - 1) = I(X; H^{(L)}) + I(X; H^{(L-2)}) - 2 I(X; H^{(L-1)}). \tag{4}$$

Combining (2) and (4), we can state that an acceptable reconstruction quality has been achieved when $I(X; H^{(L)})$
is non-decreasing and the following conditions are satisfied simultaneously:

$$\Delta^2 I(L) < 0, \quad \forall L > L_0,$$
$$\Delta I(L) < \tau, \tag{5}$$

where $L_0$ is a constant and $\tau$ is a threshold defining an acceptable quality of reconstruction. Since $\Delta I(L)$ is
monotone non-increasing for $L > L_0$ and is bounded below by zero, the monotone convergence theorem [13] applied to
(3) implies that $h(X \mid H^{(L-1)}) \to h(X \mid H^{(L)}) \to h_0$ for some $L > L_0$. Thus, $h_0$ is the minimum level of uncertainty
in a scene described by $L$ observations.

Since  the  criterion  does  not  depend  on  how  the  intermediate  reconstructions  are  obtained,  it  is,  in  principle, 
independent  of  the  3D  reconstruction  strategy.  However,  the  procedure  for  estimation  of  IMI  may  be  optimized  for 
a  particular  algorithm.  Details  on  the  estimation  process  can  be  found  in  [14]. 
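
The stopping rule in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation procedure of [14]: the function name `stopping_index`, the convention that `mi[L]` holds $I(X; H^{(L)})$ with `mi[0] = 0`, and the choice to give up as soon as the trend is broken are all hypothetical choices made for this example.

```python
def stopping_index(mi, tau, l0=1):
    """Return the smallest L > l0 at which criterion (5) is met, else None.

    mi  : list where mi[L] is an estimate of I(X; H^(L)); mi[0] is 0 (assumed convention)
    tau : threshold on the incremental mutual information (IMI)
    l0  : the constant L_0 in (5); must be at least 1 so that Delta^2 I is defined
    """
    if l0 < 1:
        raise ValueError("l0 must be at least 1")
    # delta[L] = Delta I(L) = I(X; H^(L)) - I(X; H^(L-1)), defined for L >= 1.
    delta = [None] + [mi[L] - mi[L - 1] for L in range(1, len(mi))]
    for L in range(l0 + 1, len(mi)):
        second_diff = delta[L] - delta[L - 1]  # Delta^2 I(L) as in (4)
        # The mutual information must be non-decreasing and the IMI must be falling.
        if delta[L] < 0 or second_diff >= 0:
            return None
        if delta[L] < tau:
            return L
    return None
```

With mutual information values that grow like $\frac{1}{2}\log(1 + L)$ (in nats) and `tau = 0.1`, the function returns 5; a sequence whose increments jump up and down returns `None`, which corresponds to the case where further observations cannot be trusted to improve the reconstruction.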


2.3  IMI  Computation  Under  Gaussian  Distributions 


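Assume that $X \sim \mathcal{N}(0, \sigma_X^2 = P_X)$ and $\{V(i), i = 1, \ldots, N\}$ is a sequence of independent random variables
distributed as $\mathcal{N}(0, \sigma_{V(i)}^2 = P_{V(i)})$.$^{2}$ Let $P_V = \mathrm{diag}[P_{V(i)}]_{i=1,\ldots,N} = \mathrm{diag}[\sigma_{V(1)}^2, \ldots, \sigma_{V(N)}^2]$.
From (1), $E[H(i)] = 0$ and

$$E[H(i)H(j)] = E[(X + V(i))(X + V(j))] = P_X + P_{V(i)}\delta_{ij}, \tag{6}$$

where $\delta_{ij}$ is the Kronecker delta function. Thus the covariance of $H^{(N)}$ is $P_{H^{(N)}} = P_V + \mathbf{1}_N P_X \mathbf{1}_N^T$, where $\mathbf{1}_N$ is
a vector of $N$ ones. Then the mutual information between $X$ and $H(i)$ is

$$I(X; H(i)) = h(H(i)) - h(H(i) \mid X) = \frac{1}{2}\log\left(1 + \frac{P_X}{P_{V(i)}}\right). \tag{7}$$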


Next, consider the mutual information between the unknown $X$ and the vector of observations $H^{(N)}$. We will denote
by $|K|$ the determinant of a matrix $K$.

$$
\begin{aligned}
I(X; H^{(N)}) &= h(H^{(N)}) - h(H^{(N)} \mid X) \\
&\overset{(a)}{=} h(H^{(N)}) - \sum_{i=1}^{N} \frac{1}{2}\log\left(2\pi e P_{V(i)}\right) \\
&\overset{(b)}{=} \frac{1}{2}\log\frac{\left|P_V + \mathbf{1}_N P_X \mathbf{1}_N^T\right|}{|P_V|}
\end{aligned}
\tag{8}
$$

2 Where necessary to distinguish a particular feature point, we will use the notation $\sigma_{X_j}^2$ and $P_{V_j(i)}$ or $\sigma_{V_j(i)}^2$ for the $j$-th point.

(a) is a result of applying the chain rule of entropy and substituting the expression for the differential entropy
of a Gaussian random variable [12]; (b) is due to the fact that $|P_V| = \prod_{i=1}^{N} P_{V(i)} = \prod_{i=1}^{N} \sigma_{V(i)}^2$. Using the
method of induction and the properties of determinants, it can be shown that

$$\left|P_V + \mathbf{1}_N P_X \mathbf{1}_N^T\right| = \prod_{i=1}^{N} \sigma_{V(i)}^2 + \sigma_X^2 \sum_{i=1}^{N} \prod_{j=1, j \neq i}^{N} \sigma_{V(j)}^2.$$

Then from (8), the expression for the mutual information becomes

$$I(X; H^{(N)}) = \frac{1}{2}\log\left(1 + \sum_{i=1}^{N} \frac{\sigma_X^2}{\sigma_{V(i)}^2}\right). \tag{9}$$
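
As a numerical sanity check, the determinant form (8) and the closed form (9) can be compared directly. The sketch below is illustrative only: the function names and the example variances are assumptions made here, logarithms are natural (so values are in nats), and a small hand-written Gaussian elimination is used in place of a linear algebra library.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def mi_from_determinants(p_x, p_v):
    """Equation (8): 0.5 * log(|P_V + 1 P_X 1^T| / |P_V|)."""
    n = len(p_v)
    # Covariance of the observation vector, from (6): P_X everywhere, plus P_V(i) on the diagonal.
    cov_h = [[p_x + (p_v[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    return 0.5 * math.log(det(cov_h) / math.prod(p_v))


def mi_closed_form(p_x, p_v):
    """Equation (9): 0.5 * log(1 + sum_i P_X / P_V(i))."""
    return 0.5 * math.log(1.0 + sum(p_x / v for v in p_v))
```

For example, with `p_x = 2.0` and `p_v = [0.5, 1.0, 4.0]` both functions give $\frac{1}{2}\log 7.5$, and with a single observation they reduce to (7).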


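Let us compute the difference in the mutual information for the two sets of observations, $H^{(N)}$ and $H^{(N-1)}$. We
shall call this the incremental mutual information, $\Delta I$. Writing $P_{V^{(N)}}$ for the $N \times N$ matrix $P_V$,

$$
\begin{aligned}
\Delta I &= I(X; H^{(N)}) - I(X; H^{(N-1)}) \\
&= \frac{1}{2}\log\frac{\left|P_{V^{(N)}} + \mathbf{1}_N P_X \mathbf{1}_N^T\right| \, \left|P_{V^{(N-1)}}\right|}{\left|P_{V^{(N-1)}} + \mathbf{1}_{N-1} P_X \mathbf{1}_{N-1}^T\right| \, \left|P_{V^{(N)}}\right|} \\
&= \frac{1}{2}\log\frac{\left(\prod_{i=1}^{N} \sigma_{V(i)}^2 + \sigma_X^2 \sum_{i=1}^{N} \prod_{j=1, j \neq i}^{N} \sigma_{V(j)}^2\right) \prod_{i=1}^{N-1} \sigma_{V(i)}^2}{\left(\prod_{i=1}^{N-1} \sigma_{V(i)}^2 + \sigma_X^2 \sum_{i=1}^{N-1} \prod_{j=1, j \neq i}^{N-1} \sigma_{V(j)}^2\right) \prod_{i=1}^{N} \sigma_{V(i)}^2} \\
&= \frac{1}{2}\log\left(1 + \frac{1/\sigma_{V(N)}^2}{\frac{1}{\sigma_X^2} + \sum_{i=1}^{N-1} \frac{1}{\sigma_{V(i)}^2}}\right) \\
&= \frac{1}{2}\log\left(1 + \frac{1/P_{V(N)}}{\frac{1}{P_X} + \sum_{i=1}^{N-1} \frac{1}{P_{V(i)}}}\right).
\end{aligned}
\tag{10}
$$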


Equation (10) gives us a measure of the extra information that would be obtained by including an additional
observation into the fusion process. Also, since

$$I(X; H^{(N)}) - I(X; H^{(N-1)}) = h(X \mid H^{(N-1)}) - h(X \mid H^{(N)}), \tag{11}$$

the quantity defined as the incremental mutual information can also be referred to as the incremental conditional
entropy. Thus we are measuring the reduction in the uncertainty of the solution as we consider an extra observation.
The difference in the differential entropy determines the decrease in the coding length of the scene structure as the
number of observations increases [12].

The above calculation requires computing the variances of the intermediate reconstructions. Any method to
compute them is perfectly suitable. In an earlier work [15], we have shown how to do this for the case of 3D
reconstruction using optical flow. It should be remembered that all the geometric quantities have to be with respect to a
particular frame of reference; hence it may be necessary to transform the variances appropriately.
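
Once the variances are available, the last line of (10) is a one-line computation. The following sketch assumes the variances are simply given as a Python list (how they are obtained is outside its scope); the function names are hypothetical and the logarithm is natural.

```python
import math


def mutual_information(p_x, p_v):
    """Equation (9): I(X; H^(N)) for prior variance p_x and noise variances p_v."""
    return 0.5 * math.log(1.0 + sum(p_x / v for v in p_v))


def incremental_mi(p_x, p_v):
    """Equation (10): gain from adding the last observation p_v[-1] to p_v[:-1]."""
    # Precision (inverse variance) accumulated from the prior and the first N-1 observations.
    precision_before = 1.0 / p_x + sum(1.0 / v for v in p_v[:-1])
    return 0.5 * math.log(1.0 + (1.0 / p_v[-1]) / precision_before)
```

For any list of variances, `incremental_mi(p_x, p_v)` agrees with `mutual_information(p_x, p_v) - mutual_information(p_x, p_v[:-1])`. With equal noise variances the value falls steadily as observations are added, and a very noisy new observation (large `p_v[-1]`) contributes almost nothing.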


An Estimation Theoretic Interpretation: We will now present an alternative interpretation of the result in (10)
from an estimation theoretic perspective. The mean squared distortion is defined as

$$D(X, \hat{X}) = \frac{1}{M}\sum_{j=1}^{M} E\left[(X_j - \hat{X}_j)^2\right]. \tag{12}$$


Let $p(X_j, H_j(1), \ldots, H_j(N))$ denote the joint density function of the parameter and observations. The
(minimum) mean square error estimator $\hat{X}_j$ of $X_j$, obtained from $H_j^{(N)}$, is $\hat{X}_j(N) = E[X_j \mid H_j^{(N)}]$. From the
Cramer-Rao lower bound (CRLB) we can write the following set of inequalities.


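$$
\begin{aligned}
D &\ge \frac{1}{M}\sum_{j=1}^{M} \frac{1}{\frac{1}{\sigma_{X_j}^2} + \sum_{i=1}^{N} E\left[-\frac{\partial^2}{\partial X_j^2}\log p(H_j(i) \mid X_j)\right]} \\
&\ge \frac{1}{\frac{1}{M}\sum_{j=1}^{M}\left(\frac{1}{\sigma_{X_j}^2} + \sum_{i=1}^{N} \frac{1}{P_{V_j(i)}}\right)} \\
&\triangleq \frac{1}{\frac{1}{M}\sum_{j=1}^{M} \frac{1}{D_j(N)}}
\end{aligned}
\tag{13}
$$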


The last step is a result of the application of Jensen's inequality [16] and the fact that
$E\left[-\frac{\partial^2}{\partial X_j^2}\log p(H_j(i) \mid X_j)\right] = \frac{1}{P_{V_j(i)}}$. Recalling that (10) is for a particular feature point where the subscript has been suppressed for clarity of
notation, let us denote $\Delta I_j = I(X_j; H_j^{(N)}) - I(X_j; H_j^{(N-1)})$. Then from (13) and the last expression of (10), we get

$$\Delta I_j = \frac{1}{2}\log\left(\frac{D_j(N-1)}{D_j(N)}\right). \tag{14}$$


Alternatively, consider the innovations at the $N$-th stage, $\gamma_N = X_N - \hat{X}_N^{-}$. Then, following the standard derivation for the
Kalman filter [16], it can be shown that the variance of the innovations is


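$$P_{\gamma_N} = \sigma_{V(N)}^2\left(1 + \frac{1/\sigma_{V(N)}^2}{\frac{1}{\sigma_X^2} + \sum_{i=1}^{N-1} \frac{1}{\sigma_{V(i)}^2}}\right), \tag{15}$$

which shows that, for each feature point, the incremental mutual information is related to $P_{\gamma_N}$ by

$$\Delta I = \frac{1}{2}\log\left(\frac{P_{\gamma_N}}{\sigma_{V(N)}^2}\right). \tag{16}$$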

These  relationships  provide  an  alternative  estimation  theoretic  interpretation  to  our  result.  Taken  together  (10),  (14) 
and  (16)  demonstrate  the  use  of  statistical  evaluation  techniques  to  the  SfM  problem,  when  it  is  suitably  formulated. 
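
The agreement between (10), (14) and (16) can be checked numerically. In the sketch below, `posterior_variance` plays the role of the per-point bound $D_j(N)$ in (13), and the innovation is taken to be the new observation minus its prediction from the earlier ones, which is the usual Kalman filter convention and an assumption about the definition intended here. All function names are hypothetical.

```python
import math


def posterior_variance(p_x, p_v):
    """D(N): inverse of the prior precision plus the summed observation precisions."""
    return 1.0 / (1.0 / p_x + sum(1.0 / v for v in p_v))


def innovation_variance(p_x, p_v):
    """Equation (15): error variance before the last update plus the last noise variance."""
    return posterior_variance(p_x, p_v[:-1]) + p_v[-1]


def imi_from_distortion(p_x, p_v):
    """Equation (14): 0.5 * log(D(N-1) / D(N))."""
    return 0.5 * math.log(posterior_variance(p_x, p_v[:-1]) / posterior_variance(p_x, p_v))


def imi_from_innovation(p_x, p_v):
    """Equation (16): 0.5 * log(P_gamma / sigma_V(N)^2)."""
    return 0.5 * math.log(innovation_variance(p_x, p_v) / p_v[-1])
```

With `p_x = 2.0` and `p_v = [0.5, 1.0, 4.0]`, both functions return $\frac{1}{2}\log(7.5/7)$, the same value that the last line of (10) gives for these variances.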


3  Analysis  and  Experiments 

3.1  Analysis: 

Present methods to evaluate the quality of a reconstruction involve computing the distortion in (12). For a fusion
algorithm, this means that we need to compute (12) at every stage of the fusion and decide when to stop. This is
computationally intensive. Moreover, distortion measures are not always very useful in practical experiments, since the
choice of an acceptable threshold is often arbitrary and the source of the error (whether in the intermediate
reconstructions or in the fusion algorithm) is difficult to identify. In our approach, (10) gives a direct way to measure the
contribution of the intermediate solutions and the accuracy of the final solution as the algorithm progresses. The statistics of
the error can be computed using the SfM equations and their solutions, as described in [15]. If the solution is far
from its desired values, the error would be larger than if the solution is close to its true value. When the error in
the intermediate reconstructions is small, $D_j$ is small and hence the difference in the mutual information is small.
Ideally, this difference should go to zero as we include more and more observations. If the error is large, $D_j$ would
be large and $\Delta I_j$ would not decrease appreciably with the number of observations. Another salient feature of our
method is that we measure the information content between the true structure and the reconstructions before the
fusion. This allows us to understand the source of the error better, since the effects of the intermediate reconstructions
and of the fusion algorithm are separated.

One scenario where this idea can be applied is reconstruction from a video sequence where intermediate
reconstructions, $\mathbf{H}(1), \ldots, \mathbf{H}(L)$, obtained from a few frames (two or three) are combined together. Another
application would be where partial reconstructions have been obtained from multiple cameras.$^{3}$ These partial models
would have common overlapping regions which can be combined together to form the single estimate. In this case,
$\mathbf{H}(1), \ldots, \mathbf{H}(L)$ would represent these common sub-regions from $L$ separate reconstructions.

The statistical assumptions of independence and Gaussianity are necessary in order to derive closed form
expressions for the quantities of interest. The independence of the intermediate estimates $\mathbf{H}(1), \ldots, \mathbf{H}(L)$ may be
valid when these are obtained from separate imaging systems and then combined. When the same camera is used,
the intermediate reconstructions should be obtained with non-overlapping frames; otherwise the common frames
increase the dependencies. Regarding the Gaussianity assumptions, it has been pointed out by Zhang in [7] that the
correspondence errors in SfM are usually normally distributed, if we can get rid of the outliers in the matches.


3.2  Experiments: 

Experiment 1: A set of 3D points was generated so that we know their true positions. The perspective projections
of these points were generated and Gaussian noise with zero mean and known variance was added to these 2D
locations. The projections were taken for different positions of the camera, so that in the end a set of tracked features
was obtained. From every pair of such tracked features, the positions of the original 3D points were estimated,
which results in a set of 3D reconstructions. The first plot of Figure 2(a) shows the true value of the 3D points and
their estimated reconstruction from all the frames over which the features could be tracked.$^{4}$ The second diagram
in Figure 2(a) plots the decrease in the incremental mutual information with the increasing number of intermediate
reconstructions.
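
A much reduced stand-in for this kind of simulation can be written directly from the model (1), without any camera projection or SfM step. The sketch below is not the experiment described above: it draws a scalar inverse depth and noisy intermediate estimates from the Gaussian model of Section 2.3, fuses them with the conditional-mean estimator, and compares the empirical squared error with the bound $D(N)$. The function name, the variances, the number of trials and the seed are all assumptions chosen for illustration.

```python
import random


def simulate_mse(p_x, p_v, trials=20000, seed=0):
    """Return (empirical mean squared error, theoretical posterior variance)."""
    rng = random.Random(seed)
    precision = 1.0 / p_x + sum(1.0 / v for v in p_v)
    squared_error = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, p_x ** 0.5)                      # true inverse depth
        h = [x + rng.gauss(0.0, v ** 0.5) for v in p_v]     # H(i) = X + V(i), model (1)
        # Conditional mean of X given all H(i) under the Gaussian model.
        x_hat = sum(hi / v for hi, v in zip(h, p_v)) / precision
        squared_error += (x - x_hat) ** 2
    return squared_error / trials, 1.0 / precision
```

Under these assumptions the empirical error should sit close to the theoretical value, and rerunning with more observations in `p_v` shows the same diminishing returns that the incremental mutual information captures.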

Experiment 2: As in the previous simulation, a set of features was tracked over a number of frames. However, the
level of noise added to the feature positions was higher and it led to a mismatch of some of the features. The 3D
positions of the points were estimated using the SfM algorithm and the results were erroneous, as is clear from the
first plot of Figure 2(b). The second plot of Figure 2(b) depicts this case, where the incremental mutual information
remains large and does not follow any trend.

Experiment  3:  We  will  now  present  our  result  on  a  real  video  sequence.  The  video  consists  of  a  person  moving  his 
head  in  front  of  a  static  camera.  The  aim  was  to  reconstruct  the  model  of  the  head  of  the  person  from  this  video. 
The  focal  length  of  the  camera  was  known.  Figure  (3)(a)  represents  an  image  from  the  video  along  with  some  of 
the  feature  points  which  were  tracked.  Figure  (3)(b)  represents  the  change  in  the  incremental  mutual  information 
between  the  unknown  3D  structure  and  the  intermediate  reconstructions  from  every  pair  of  frames.  Based  on  this 
measure,  the  3D  model  was  reconstructed  using  25  frames  and  Figure  (3)(c)  shows  one  particular  view  of  this  model. 


3 This is the set-up in the "Eye Vision" technology developed by Carnegie Mellon University (CMU) and CBS Television
(http://www.ri.cmu.edu/events/sb35/tksuperbowl.html).

4 The first point was used to set the scale of the reconstruction, so that the geometric indeterminacies do not affect the result.

Figure 2: (a): The upper plot ("True and Estimated Depth Values") shows the true value of the depth of the 3D points using the solid line and the
fused estimate from the intermediate reconstructions from all the frames using the dotted lines. The lower plot ("Plot of Incremental Mutual
Information") plots the decrease in the incremental information with the increasing number of frames. (b): The upper plot shows the true value
of the depth of the 3D points using the solid line and the fused estimate from the intermediate reconstructions from all the frames using the
dotted lines. The lower plot is the change in the mutual information with increasing number of frames. This is the case where the estimated
reconstruction does not converge to the true value even with increasing observations.

Figure 3: The above figures represent a 3D reconstruction from video using the method of measuring the IMI to judge the quality of the result.
(a) is one of the images from the video along with the set of tracked features used for the reconstruction; (b) represents the change in the IMI with
the number of images; (c) depicts one view from the reconstructed model.


4  A Discussion on the Usefulness of an Information Theoretic Criterion for Vision Algorithms

The statistical quality analysis of computer vision algorithms has been studied quite extensively (see [14] for a
detailed literature survey on this topic). However, most of the methods have relied on computing the second order
statistical moments, like the covariance of the estimate. The covariance is a preferred measure because of its relation to
the Cramer-Rao lower bound (CRLB), which dictates the minimum variance that an estimator can achieve [16]. If
the variance of a sequence of estimates (say, of the 3D structure) tends towards the CRLB, then the estimate is said
to be asymptotically efficient. However, computation of the CRLB often assumes that the estimate is unbiased (see
[6]). This is because computing the bias of an estimator is not an easy task. Hence, even though expressions exist
for the CRLB of a biased estimator (known as the generalized CRLB), they are rarely used. The other main objection to
the use of variance as a measure of quality is that it neglects the effect of higher order statistics. This is often a major
approximation because the outliers, which are the source of many problems in computer vision algorithms, are often
not modeled accurately by second order statistics.

Recent work [17, 18] has shown that the motion and depth estimates are statistically biased, and the bias is
significant. This bias often propagates through later stages of the computation that rely on the motion and depth
estimates. Also, as we have shown in [15], the noise in the SfM estimates is significantly non-Gaussian. Hence we
propose that an information theoretic criterion which works by estimating the probability distribution function (pdf)
of the concerned physical quantities (e.g. the depth), rather than concentrating on certain moments only, is a more
suitable measure for a number of vision problems. The method of estimating the pdf will depend upon the particular
algorithm and underlying assumptions. The major limitation of an information theoretic criterion is the difficulty of
estimating it efficiently, robustly and accurately. This is because it is often difficult, and computationally expensive,
to estimate the probability density functions of the parameters of interest. However, estimation of MI has received some
attention among researchers in signal processing and information theory [19]. It is our hope that such information theoretic
criteria, as proposed in this paper, will become practically applicable as progress is made on robustly estimating
them.


5  Conclusion 


In  this  paper,  we  have  introduced  a  method  to  evaluate  the  quality  of  3D  reconstruction  from  a  video  sequence. 
Existing  methods  rely  on  computing  the  distortion  between  the  projections  of  the  reconstructions  and  the  original 
images  and  deciding  that  the  reconstruction  is  of  acceptable  quality  when  the  distortion  is  below  a  certain  empirically 
chosen  threshold.  In  this  paper,  we  have  shown  that  it  is  possible  to  evaluate  the  quality  of  the  3D  structure  estimate 
as  the  algorithm  proceeds  by  computing  the  incremental  mutual  information,  which  determines  the  importance  of 
considering  an  additional  observation.  It  is  related  to  the  decrease  in  the  coding  length  of  the  actual  structure 
conditioned  on  the  increasing  number  of  observations.  Finally,  experimental  results  have  been  provided  to  justify 
these  claims. 